29 research outputs found

    Privacy and Transparency in Graph Machine Learning: A Unified Perspective

    Graph Machine Learning (GraphML), which generalizes classical machine learning to irregular graph domains, has enjoyed a recent renaissance, leading to a dizzying array of models and applications across several domains. With its growing use in sensitive domains, and with government regulations calling for trustworthy AI systems, researchers have started to examine the transparency and privacy of graph learning. However, these topics have mainly been investigated independently. In this position paper, we provide a unified perspective on the interplay of privacy and transparency in GraphML.

    User Fairness in Recommender Systems

    Recent work on recommender systems has focused on diversity as an important aspect of recommendation quality. In this work we argue that post-processing algorithms aimed solely at improving diversity among recommendations lead to discrimination among users. We introduce the notion of user fairness, which has been overlooked in the literature so far, and propose measures to quantify it. Our experiments on two diversification algorithms show that an increase in aggregate diversity results in increased disparity among users.
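
    As a rough illustration of how such user-level disparity could be quantified (an assumed metric for illustration only, not necessarily the measure proposed in the paper), the sketch below compares each user's recommendation relevance before and after a crude diversification step on synthetic data and summarizes the spread across users with a Gini coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

def gini(values):
    """Gini coefficient of a non-negative 1-D array (0 = perfect equality)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    if n == 0 or v.sum() == 0:
        return 0.0
    cum = np.cumsum(v)
    # Trapezoidal Lorenz-curve formula.
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

# Synthetic setup: 100 users, 500 items, top-10 lists, hypothetical relevance scores.
n_users, n_items, top_k = 100, 500, 10
relevance = rng.random((n_users, n_items))

# Accuracy-only lists: each user's top-k most relevant items.
acc_lists = np.argsort(-relevance, axis=1)[:, :top_k]
# Crude stand-in for a diversification step: swap half of each list for random items.
div_lists = acc_lists.copy()
div_lists[:, top_k // 2:] = rng.integers(0, n_items, (n_users, top_k - top_k // 2))

def mean_relevance(lists):
    """Mean relevance of the items shown to each user."""
    return relevance[np.arange(n_users)[:, None], lists].mean(axis=1)

loss_per_user = mean_relevance(acc_lists) - mean_relevance(div_lists)
print("aggregate relevance loss:", loss_per_user.mean())
print("disparity across users (Gini):", gini(np.clip(loss_per_user, 0, None)))
```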

    Boilerplate Removal using a Neural Sequence Labeling Model

    The extraction of main content from web pages is an important task for numerous applications, ranging from usability features, such as reader views for news articles in web browsers, to information retrieval and natural language processing. Existing approaches fall short because they rely on large numbers of hand-crafted features for classification. This results in models that are tailored to a specific distribution of web pages, e.g. from a certain time frame, but lack generalization power. We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input. This allows us to present a browser extension that highlights the content of arbitrary web pages directly within the browser using our model. In addition, we create a new, more current dataset to show that our model is able to adapt to changes in the structure of web pages and outperform the state-of-the-art model. Comment: WWW20 demo paper.
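
    To make the input representation concrete, the sketch below shows one plausible way to flatten a page into a sequence of HTML-tag and word tokens and feed it to a small bidirectional LSTM token classifier. It is a minimal sketch under assumed design choices; the tokenization scheme, vocabulary handling, and model sizes are illustrative and not taken from the paper.

```python
from html.parser import HTMLParser
import torch
import torch.nn as nn

class TokenExtractor(HTMLParser):
    """Flatten a page into a single sequence of tag tokens and word tokens."""
    def __init__(self):
        super().__init__()
        self.tokens = []
    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")
    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")
    def handle_data(self, data):
        self.tokens.extend(data.split())

class BiLSTMTagger(nn.Module):
    """Per-token binary classifier: 1 = main content, 0 = boilerplate."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)
    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))     # (batch, seq_len, 2*hidden)
        return self.out(h)                        # per-token logits

# Toy usage on a tiny page.
page = "<html><body><div>menu item</div><p>The actual article text.</p></body></html>"
parser = TokenExtractor()
parser.feed(page)
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(parser.tokens))}
ids = torch.tensor([[vocab[t] for t in parser.tokens]])
logits = BiLSTMTagger(vocab_size=len(vocab))(ids)
print(parser.tokens)
print(logits.argmax(-1))   # untrained, so the predicted labels are arbitrary here
```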

    The Multiple-orientability Thresholds for Random Hypergraphs

    A $k$-uniform hypergraph $H = (V, E)$ is called $\ell$-orientable if there is an assignment of each edge $e \in E$ to one of its vertices $v \in e$ such that no vertex is assigned more than $\ell$ edges. Let $H_{n,m,k}$ be a hypergraph drawn uniformly at random from the set of all $k$-uniform hypergraphs with $n$ vertices and $m$ edges. In this paper we establish the threshold for the $\ell$-orientability of $H_{n,m,k}$ for all $k \ge 3$ and $\ell \ge 2$, i.e., we determine a critical quantity $c_{k,\ell}^*$ such that with probability $1 - o(1)$ the hypergraph $H_{n,cn,k}$ has an $\ell$-orientation if $c < c_{k,\ell}^*$, but fails to have one if $c > c_{k,\ell}^*$. Our result has various applications including sharp load thresholds for cuckoo hashing, load balancing with guaranteed maximum load, and massive parallel access to hard disk arrays. Comment: An extended abstract appeared in the proceedings of SODA 2011.
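
    The statement can be explored empirically. The following simulation (illustrative only, not taken from the paper) samples a random $k$-uniform hypergraph and decides $\ell$-orientability exactly via a maximum-flow computation with networkx; hyperedges are drawn independently, so occasional repeats are possible, and the values of $c$ scanned below are arbitrary.

```python
import random
import networkx as nx

def is_l_orientable(n, m, k, l, rng):
    """Sample a random k-uniform hypergraph with n vertices and m edges, then
    decide l-orientability by max-flow: source -> each edge (cap 1), edge -> its
    k vertices (cap 1), vertex -> sink (cap l). An l-orientation exists iff the
    maximum flow saturates all m edge arcs."""
    G = nx.DiGraph()
    for j in range(m):
        verts = rng.sample(range(n), k)          # k distinct vertices per hyperedge
        G.add_edge("s", ("e", j), capacity=1)
        for v in verts:
            G.add_edge(("e", j), ("v", v), capacity=1)
    for v in range(n):
        G.add_edge(("v", v), "t", capacity=l)
    flow_value, _ = nx.maximum_flow(G, "s", "t")
    return flow_value == m

# Rough empirical look at the threshold for k = 3, l = 2 (small n, few trials).
rng = random.Random(0)
n, k, l, trials = 300, 3, 2, 5
for c in [1.5, 1.7, 1.9, 2.1]:
    m = int(c * n)
    hits = sum(is_l_orientable(n, m, k, l, rng) for _ in range(trials))
    print(f"c = {c:.1f}: {hits}/{trials} samples were {l}-orientable")
```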

    Joint learning from multiple information sources for biological problems

    Thanks to technological advancements, more and more biological data have been generated in recent years. This data availability offers unprecedented opportunities to look at the same problem from multiple angles. It also unveils a more global view of the problem that takes into account the intricate interplay between the involved molecules and entities. Nevertheless, biological datasets are biased, limited in quantity, and contain many false-positive samples. Such challenges often drastically degrade the performance of a predictive model on unseen data and thus limit its applicability in real biological studies. Human learning is a multi-stage process in which we usually start with simple things. Through the knowledge accumulated over time, our cognitive ability extends to more complex concepts. Children learn to speak simple words before being able to formulate sentences; similarly, being able to speak correct sentences supports our learning to speak correct and meaningful paragraphs, and so on. Generally, knowledge acquired from related learning tasks helps boost our learning capability in the current task. Motivated by this phenomenon, in this thesis we study supervised machine learning models for bioinformatics problems that can improve their performance by exploiting multiple related knowledge sources. More specifically, we are concerned with ways to enrich the supervised models’ knowledge base with publicly available related data in order to enhance their prediction performance. Our work shares commonality with existing work in multimodal learning, multi-task learning, and transfer learning, though there are certain differences in some cases. Besides the proposed architectures, we present large-scale experimental setups with consensus evaluation metrics, along with the creation and release of large datasets, to showcase our approaches’ superiority. Moreover, we add case studies with detailed analyses in which we make no simplifying assumptions, to demonstrate the systems’ utility in realistic application scenarios. Finally, we develop and make available an easy-to-use website through which non-expert users can query the model’s predictions, facilitating assessment and adoption by field experts. We believe that our work serves as one of the first steps in bridging the gap between computer science and biology, opening a new era of fruitful collaboration between computer scientists and biological field experts.
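
    As a generic illustration of the idea of enriching a supervised model with a related knowledge source (a minimal sketch, not an architecture from the thesis), the code below trains a small network whose encoder is shared between a main prediction task and an auxiliary task drawn from a larger related dataset; all names, dimensions, and the synthetic data are assumptions.

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """One encoder shared by a main head and an auxiliary head."""
    def __init__(self, in_dim=100, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.main_head = nn.Linear(hidden, 2)   # e.g. interaction vs. no interaction
        self.aux_head = nn.Linear(hidden, 5)    # e.g. a related annotation task
    def forward(self, x):
        h = self.encoder(x)
        return self.main_head(h), self.aux_head(h)

# Synthetic stand-ins for a small main dataset and a larger related public dataset.
torch.manual_seed(0)
x_main, y_main = torch.randn(256, 100), torch.randint(0, 2, (256,))
x_aux, y_aux = torch.randn(1024, 100), torch.randint(0, 5, (1024,))

model = SharedEncoderMultiTask()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    opt.zero_grad()
    main_logits, _ = model(x_main)
    _, aux_logits = model(x_aux)
    # Joint loss: the auxiliary task regularizes the shared encoder.
    loss = loss_fn(main_logits, y_main) + 0.3 * loss_fn(aux_logits, y_aux)
    loss.backward()
    opt.step()
print("final joint loss:", loss.item())
```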

    Multiple choice allocations with small maximum loads

    The idea of using multiple choices to improve allocation schemes is now well understood and is often illustrated by the following example. Suppose $n$ balls are allocated to $n$ bins, with each ball choosing a bin independently and uniformly at random. The \emph{maximum load}, i.e., the number of balls in the most loaded bin, will then be approximately $\frac{\log n}{\log \log n}$ with high probability. Suppose now the balls are allocated sequentially by placing each ball in the least loaded of $k \ge 2$ bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this scenario the maximum load drops to $\frac{\log \log n}{\log k} + \Theta(1)$ with high probability, which is an exponential improvement over the previous case. In this thesis we investigate multiple-choice allocations from a slightly different perspective. Instead of minimizing the maximum load, we fix the bin capacities and focus on maximizing the number of balls that can be allocated without overloading any bin. In the process that we consider we have $m = \lfloor cn \rfloor$ balls and $n$ bins. Each ball chooses $k$ bins independently and uniformly at random. \emph{Is it possible to assign each ball to one of its choices such that no bin receives more than $\ell$ balls?} For all $k \ge 3$ and $\ell \ge 2$ we give a critical value $c_{k,\ell}^*$ such that when $c < c_{k,\ell}^*$ such an assignment exists with high probability, and when $c > c_{k,\ell}^*$ it does not. In case such an allocation exists, \emph{how quickly can we find it?} Previous work on the total allocation time for the case $k \ge 3$ and $\ell = 1$ analyzed a \emph{breadth-first strategy}, which is shown to be linear only in expectation. We give a simple and efficient algorithm, which we call \emph{local search allocation} (LSA), to find an allocation for all $k \ge 3$ and $\ell = 1$. Provided the number of balls is below (but arbitrarily close to) the theoretically achievable load threshold, we give a \emph{linear} bound for the total allocation time that holds with high probability. We demonstrate, through simulations, an order-of-magnitude improvement in total and maximum allocation times compared to the state-of-the-art method. Our results find applications in many areas including hashing, load balancing, data management, orientability of random hypergraphs, and maximum matchings in a special class of bipartite graphs.
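
    For intuition about the underlying allocation process in the simplest case (capacity $\ell = 1$), here is a small simulation that uses a standard random-walk insertion heuristic; this is not the thesis's LSA algorithm, merely a commonly used stand-in to show the balls-into-bins setup, and the parameters are arbitrary.

```python
import random

def random_walk_allocation(n_bins, n_balls, k, max_kicks=2000, seed=0):
    """Place each ball in one of its k random bins (capacity 1 per bin),
    evicting and relocating occupants on collisions (random-walk insertion).
    Returns the assignment, or None if some ball could not be placed."""
    rng = random.Random(seed)
    choices = [rng.sample(range(n_bins), k) for _ in range(n_balls)]
    bin_of = [None] * n_balls          # bin assigned to each ball
    ball_in = [None] * n_bins          # ball currently occupying each bin
    for ball in range(n_balls):
        current, kicks = ball, 0
        while kicks <= max_kicks:
            free = [b for b in choices[current] if ball_in[b] is None]
            if free:
                b = rng.choice(free)
                bin_of[current], ball_in[b] = b, current
                break
            # All choices occupied: evict a random occupant and relocate it.
            b = rng.choice(choices[current])
            evicted = ball_in[b]
            ball_in[b], bin_of[current] = current, b
            bin_of[evicted] = None
            current, kicks = evicted, kicks + 1
        else:
            return None                # gave up: likely above the load threshold
    return bin_of

# Example: k = 3 choices, load c = 0.85 (below the known k = 3, l = 1 threshold of about 0.918).
n = 10000
assignment = random_walk_allocation(n_bins=n, n_balls=int(0.85 * n), k=3)
print("allocation found" if assignment else "allocation failed")
```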

    The Multiple-Orientability Thresholds for Random Hypergraphs

    A $k$-uniform hypergraph $H = (V, E)$ is called $\ell$-orientable if there is an assignment of each edge $e \in E$ to one of its vertices $v \in e$ such that no vertex is assigned more than $\ell$ edges. Let $H_{n,m,k}$ be a hypergraph drawn uniformly at random from the set of all $k$-uniform hypergraphs with $n$ vertices and $m$ edges. In this paper we establish the threshold for the $\ell$-orientability of $H_{n,m,k}$ for all $k \ge 3$ and $\ell \ge 2$, that is, we determine a critical quantity $c_{k,\ell}^*$ such that with probability $1 - o(1)$ the hypergraph $H_{n,cn,k}$ has an $\ell$-orientation if $c < c_{k,\ell}^*$, but fails to have one if $c > c_{k,\ell}^*$. Our result has various applications, including sharp load thresholds for cuckoo hashing, load balancing with guaranteed maximum load, and massive parallel access to hard disk arrays.
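
    For readers less familiar with orientability, the following Hall-type reformulation (a standard max-flow argument stated here for context, not a result quoted from the paper) makes precise when an $\ell$-orientation exists.

```latex
% l-orientability as a bipartite b-matching feasibility condition.
% Edges demand one unit each; vertices have capacity l. By max-flow/min-cut
% (equivalently, a deficiency version of Hall's theorem) one obtains:
\[
  H = (V, E) \text{ is } \ell\text{-orientable}
  \iff
  \Bigl|\, \bigcup_{e \in E'} e \,\Bigr| \;\ge\; \frac{|E'|}{\ell}
  \quad \text{for every } E' \subseteq E .
\]
% In particular, H_{n,cn,k} fails to be l-orientable as soon as it contains a
% sub-hypergraph with more than l times as many edges as vertices.
```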